Research & Occasional Paper Series: CSHE.5.09 



CSHE 



Center for Studies in Higher Education 



UNIVERSITY OF CALIFORNIA, BERKELEY 

http://cshe.berkeley.edu/ 

SERU Project and Consortium Research Paper* 

DECODING LEARNING GAINS 

Measuring Outcomes and the Pivotal Role of the Major and Student Backgrounds 

May 2009 

Gregg Thomson and John Aubrey Douglass 
Copyright 2009 Gregg Thomson and John Aubrey Douglass, all rights reserved. 



ABSTRACT 

Throughout the world, interest in gauging learning outcomes at all levels of education has grown
considerably over the past decade. In higher education, measuring “learning outcomes” is viewed by
many stakeholders as a relatively new method to judge the “value added” of colleges and universities.
The potential to accurately measure learning gains is also viewed as a diagnostic tool for institutional
self-improvement. This essay compares the methodology and potential uses of three tools for measuring
learning outcomes: the Collegiate Learning Assessment (CLA), the National Survey of Student
Engagement (NSSE), and the University of California’s Undergraduate Experience Survey (UCUES). In
addition, we examine UCUES 2008 responses of seniors who entered as freshmen on six of the
educational outcomes self-reports: analytical and critical thinking skills, writing skills, reading and
comprehension skills, oral presentation skills, quantitative skills, and skills in a particular field of study.
This initial analysis shows that campus-wide assessments of learning outcomes are generally not valid
indicators of learning outcomes, and that self-reported gains at the level of the major are perhaps the best
indicator we have, thus far, for assessing the value-added effects of a student’s academic experience at a
major research university. UCUES appears to be the better approach for assessing and reporting learning
outcomes. This is because UCUES offers more extensive academic engagement data as well as a much
wider range of demographic and institutional data, and therefore an unprecedented opportunity to
advance our understanding of the nature of self-reported learning outcomes in higher education, and the
extent to which these reports can contribute as indirect but valid measures of positive educational
outcomes. At the same time, the apparent differences in learning outcomes across the undergraduate
campuses of the University of California without controls for campus differences in composition illustrate
some of the limitations of self-reported data.



In the US and throughout the world, interest in gauging learning outcomes at all levels of education has
grown considerably over the past decade. In higher education, “learning outcomes” are viewed by many
stakeholders, including lawmakers and advocates of new and more expansive accountability regimes, as
a method to measure the value added, and in some sense the quality and effectiveness, of colleges and
universities. But perhaps most importantly, collecting and making public more and better assessment
data about how and what students learn offers an important and relatively new means for institutional
self-improvement.



* The SERU Project and Consortium is a collaborative of 15 major research universities based at the Center for Studies in Higher Education at
UC Berkeley and including the administration of the SERU survey of undergraduates.

** Gregg Thomson is Executive Director of the Office of Student Research and Campus Surveys at UC Berkeley; John Aubrey Douglass is a
Senior Research Fellow at the Center for Studies in Higher Education; both are co-PIs on the SERU Project and Consortium. Initial analysis
and report preparation was conducted by Preeta Saxena of the UC Riverside Survey Research Center, under the direction of Steven Brint and
David Crow, Associate Director of the UC Riverside Survey Research Center. Thanks to colleagues David Radwin, Steve Chatman, Cynthia
Schrager, Elizabeth Berkes, and Dennis Hengstler for their insights regarding the use of SERU/UCUES data.






Thomson and Douglass: DECODING LEARNING GAINS 






In 2005, Secretary of Education Margaret Spellings convened a special commission to focus on how to
make higher education institutions more accountable in light of rising public and private funding and
investment in American colleges and universities. Reflecting to some degree the structural approach of
the “No Child Left Behind” legislation that focused on reform in K-12 education, the “Spellings
Commission” advocated the building of a similar and extensive learning assessment program in U.S.
higher education. In its final September 2006 report, the commission imagined two routes for greater
accountability:

• The development and wide use of some sort of standardized test to measure value added

• New federal guidelines for the nation’s network of accrediting bodies to help develop national
standards and comparative review of institutional performance

An institution should “gather evidence about how well students in various programs are achieving learning
goals across the curriculum and about the ability of its graduates to succeed in a challenging and rapidly
changing world,” stated the report, “and the information should be used, as it historically has been, to help
the institutions figure out how best to improve their performance” (Spellings Commission 2006).

Although it lacked significant authority over state higher education systems, the federal commission
heightened an ongoing debate over the ideal of measuring learning outcomes. There was also debate
over the appropriate use of such data - for example, as a means for identifying poor institutional
performers, for conditioning federal and state funding, and for informing potential students and their
families.

The call for added accountability, the emphasis on testing, and the refocusing of the voluntary national
accreditation system have had the beneficial effect of increasing the higher education community’s
attention to more systematically evaluating teaching and learning. On the heels of the Spellings
Commission, the National Association of State Universities and Land-Grant Colleges and the American
Association of State Colleges and Universities collaborated to create a Voluntary System of
Accountability (VSA) that requires participating institutions to report learning outcomes using one of three
competing standardized tests of undergraduate “higher order skills”: the Collegiate Assessment of
Academic Proficiency (from ACT), the Measure of Academic Proficiency and Progress (from the
Educational Testing Service), and the Collegiate Learning Assessment (CLA).

The notion that standardized testing is the appropriate way to assess learning outcomes at the university
level has not been universally accepted, however. In fact, in 2007 the University of California explicitly
rejected this component of the Voluntary System of Accountability, noting that “using standardized tests
on an institutional level as measures of student learning fails to recognize the diversity, breadth, and
depth of discipline-specific knowledge and learning that takes place in colleges and universities today.”

In 2008 the Consortium on Financing Higher Education (COFHE) released its statement on assessment
in which it firmly rejected standardized testing:

Based on our experience, we are skeptical about efforts to make this kind of assessment through
standardized tests, including those that purport to measure critical reasoning. ... [A]ssessment
experts are far from agreement about whether “value added” can be measured accurately across
diverse institutions. ... [W]e do not endorse any approach that depends solely on a single
standardized measure or even a single set of standardized measures. (COFHE 2008)

In addition to the COFHE membership of 31 leading private colleges and universities, the statement on
assessment was endorsed by dozens of others, including the University of California, Berkeley. Ironically,
by early 2008, Secretary Spellings herself apparently no longer held the view that the
one-measure-fits-all-institutions approach advocated by the Spellings Commission was appropriate. “All
colleges should be allowed to describe their own unique missions,” she stated before the National Press
Club, “and be judged against that.” She went on to say, “That is totally within the jurisdiction of each
institution.”

Regardless of the opposition to standardized testing to assess learning outcomes, the imperative to
measure and report on student learning outcomes for accreditation and public accountability remains
strong. The University of California, for example, is implementing a comprehensive Accountability
Framework, and it is expected that students’ self-reported measures of learning will be included using
data from the University of California Undergraduate Experience Survey - a survey developed by the UC
community as part of the Student Experience in the Research University Project (henceforth referred to
as SERU/UCUES).

But what is the best approach for assessing student learning outcomes?

This paper discusses three possible means for measuring learning outcomes for major research
universities (including the University of California) and their strengths and weaknesses:

• The Collegiate Learning Assessment (CLA), which has emerged as the most visible of the
standardized tests of student learning;

• The nationally prominent National Survey of Student Engagement (NSSE); and

• The SERU/UCUES survey currently used by the University of California and, more recently, at a
number of other AAU public research universities.

We also provide in this essay an exploration of SERU/UCUES self-reported learning gains. Previous
research with SERU/UCUES data has documented both the striking demographic diversity of the
undergraduate student body (Douglass, Roebken & Thomson, 2007; Douglass & Thomson, 2008) and
the significant differences in student experience by field of study at the University of California (Chatman,
2007; Brint, Cantwell & Hanneman, 2008). Therefore, a major consideration in determining the best
approach to measuring learning outcomes (more broadly, educational outcomes) at the University of
California is that it should be both cost-effective and up to the task of addressing this demographic and
disciplinary complexity.

Moreover, while some view the criteria for choosing how to measure learning outcomes in terms of the
ability to assess the relative “value added” of institutions, our view is that decisions about the collection of
student learning data should also be guided by the potential of this assessment to encourage institutional
self-improvement.

A. CLA — The Negative Value of “Value Added” 

Although the University of California initially rejected the use of standardized tests to assess learning
outcomes, the increasing prominence and use of the Collegiate Learning Assessment by other colleges
and universities (including, for example, the University of Texas system) suggest that the CLA is
certainly worth a close look.

Developed by the Council for Aid to Education, the Collegiate Learning Assessment (CLA) is a
carefully developed written test, focused on critical thinking, analytic reasoning, written
communication, and problem solving, that is administered to small random samples of freshmen in the fall
and seniors in the spring. Test results derived from these samples provide an institution-wide measure of
the institution’s contribution or value-added to the development of its students’ cognitive competencies or
learning. Institutions can then be compared on the basis of their relative value-added performance.

The value of the CLA derives from two well-articulated principles.

• First, for accountability purposes, valid assessment of learning outcomes for students at an institution
is only possible by rigorously controlling for the characteristics of those students at matriculation
(Klein et al., 2005; Klein, Benjamin & Shavelson, 2007).

• Second, by using SAT scores as the control for initial student characteristics, given how well the CLA
tests have been designed and validated as measures of general cognitive skills, it is possible on the
basis of surprisingly small samples to calculate the difference between freshman and senior test
performance and compare that difference to that predicted or expected on the basis of student
characteristics at entry.







• Third, this relative performance or value-added can in turn be compared to the relative performance or
value-added achieved at other institutions, hence providing the most valid or fair comparison of how well
a college is performing in terms of student learning (Klein, Benjamin & Shavelson, 2007; Klein et al.,
2008).
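
The logic of the three points above can be sketched in a few lines of code. This is only a minimal illustration under stated assumptions, not the CLA's actual scoring procedure: the function names, the use of a simple least-squares line fitted to institution-level means, and all of the scores below are hypothetical and invented for the example.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def residuals(xs, ys):
    """Observed score minus the score expected from the fitted line."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def value_added(fr_sat, fr_test, sr_sat, sr_test):
    """Senior residual minus freshman residual, one value per institution."""
    fr_res = residuals(fr_sat, fr_test)
    sr_res = residuals(sr_sat, sr_test)
    return [s - f for s, f in zip(sr_res, fr_res)]

# Hypothetical institution-level means for four institutions (made-up numbers).
fr_sat = [1000, 1100, 1200, 1300]
fr_test = [1000, 1100, 1200, 1300]
sr_sat = [1000, 1100, 1200, 1300]
sr_test = [1100, 1200, 1340, 1400]

scores = value_added(fr_sat, fr_test, sr_sat, sr_test)
```

Because each score is a residual, an institution is judged only relative to the others in the comparison set: the scores sum to zero by construction, so some institutions must always land below "expected" no matter how much their students learn.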

Banta (2006, 2007, 2009) as well as other prominent higher education researchers (e.g., Pike, 2006) have
questioned the CLA enterprise on a number of grounds.
For one, the CLA and the SAT are so highly correlated that the amount of variance in student learning
outcomes to be accounted for after controlling for SAT scores is incredibly small and most institutions will
simply be in the expected range. The results are also sample-dependent in ways not recognized by CLA
(for example, student motivation). Finally, the design that compares the test performance of a sample of
freshmen and a sample of seniors cannot isolate institutional value-added from other characteristics of
institutions and their students that affect student learning, but have nothing directly to do with the
instructional quality and effectiveness of an institution.

Other criticisms center on the assumption that the CLA has fashioned tests of agreed-upon general
cognitive skills that are relevant to all students (Pike, 2006); recent findings (Arum & Roksa, 2008)
suggest that CLA results are, to some extent, discipline-specific. Because of the cost and difficulty of
evaluating individual student essays, the design of the CLA relies on a rather small sample size (often 250
to 300 students) and thereby generates only generalities about overall institutional effectiveness. There is
very little if any useful information at the level of the major. The CLA might generate meaningful data in a
small liberal arts college, but it appears of very limited use in large and complex universities.

To veterans in the higher education research community, the “history lessons” of earlier attempts to rank
institutions on the basis of “value-added” measures are particularly telling. There is evidence that all
previous attempts at large-scale or campus-wide assessment in higher education on the basis of
value-added measures have collapsed, in part due to the observed instability of change measures
(Adelman, 2006; Banta, 2006, 2007; Pike, 2006).

The CLA response attempts to demonstrate statistically that much of this criticism does not apply to the
CLA. For example, its proponents argue that, regardless of the amount of variance accounted for, the
tightly SAT-controlled design allows for the extraction of valid results regardless of the vagaries of
specific samples or student motivation (Klein, Benjamin & Shavelson, 2007; Klein et al., 2008). But
ultimately, even if the proponents of the CLA are right and their small-sample testing program with
appropriate statistical controls could produce a reliable and valid “value-added” institutional score, this
does not mean that it is appropriate for the University of California to commit its resources to this enterprise.

There are at least three reasons for rejecting the implementation of the CLA for institutional
“accountability” at the University of California regardless of what (or whom) one believes regarding the
arguments for its validity.

First, the CLA claims that, in addition to providing an institution-wide “value-added” score, it serves as a
diagnostic tool designed “to assist faculty in improving teaching and learning, in particular as a means
toward strengthening higher order skills.” But this is a preposterous proposition for a large, complex
research university like the University of California.

Exactly how would the statistically derived result (on the basis of a few hundred freshman and senior
test-takers) or news that, for example, the Berkeley campus was performing more poorly than expected (or
relatively more poorly than, say, the Santa Barbara campus) assist the Berkeley faculty in improving its
teaching and learning? In reality, this news would surely generate “more heat than light” and could offer
no guidance whatsoever in terms of institutional self-improvement.

Second, any approach to the assessment of student learning at the University of California that provides
no ability to examine how well the university is doing in regard to its student populations from various
backgrounds and life circumstances is incompatible with its core value of diversity and access.






Finally, embarking on a “Holy Grail-like” quest for a valid “value-added” measure is, of course, a
fundamental value-choice. Ironically, the more the CLA enterprise insists that the only thing that really
matters for valid accountability in higher education is a statistical test of “value-added” by which
universities can be scored and ranked, the more the CLA lacks a broader validity, namely, what Braun
identifies as “systemic validity”:

Assessment practices and systems of accountability are systemically valid if they generate useful
information and constructive responses that support one or more policy goals (Access, Quality,
Equity, Efficiency) within an education system without causing undue deterioration with respect to
other goals. (2008)

“Valid” or not, the successful promotion of a narrow standardized test “value-added” program of
assessment in higher education promises little in the way of “useful information and constructive
responses” while threatening “undue deterioration” elsewhere. Such a ranking system could only have
decidedly pernicious effects, as Adelman (2006) observes.

In Lee Shulman’s terms, the CLA is a “high stakes/low yield” strategy where high stakes corrupt the very
processes they are intended to support (2007).

For the purposes of institution-wide assessment, then, we surmise that the net value of CLA’s value-added
scheme would be more negative than positive.



B. Student Self-Reports of Learning Gains on NSSE 

Over the last decade or so, the National Survey of Student Engagement (NSSE) has grown remarkably in
its use among a great variety of higher education institutions, although predominantly in liberal arts
colleges. The most recent annual report notes:

Like the speaker who “needs no introduction,” NSSE may well have achieved an eminence that
needs no foreword. The acronym is everywhere: on institutional Web sites and the lips of parents and
students selecting a college; the pages of USA TODAY, the Chronicle of Higher Education, Change
magazine, and the New York Times; the 2006 report from the National Commission on the Future of
Higher Education, and now on the template for the Voluntary System of Accountability. ... In fact, go
to Google and you’ll find “about 299,000” entries that deal with NSSE. (National Survey of Student 
Engagement, 2008b) 

Established a decade ago and promoted as a constructive alternative to invidious college ranking 
schemes (especially the US News and World Report rankings), NSSE is the obvious source for useful 
survey-based information on undergraduate learning outcomes. Or so it would seem. 

Thirty years ago Pace (1979, 1984) initiated the systematic collection of undergraduate student 
experience data with the College Student Experience Questionnaire (CSEQ), and it included items on 
self-reported educational outcomes. In 1999 Pace’s format for these items was incorporated verbatim 
into the National Survey of Student Engagement (NSSE) and ten years later remains the basis for a 16- 
item section on NSSE. The question reads, “To what extent has your experience at this institution 
contributed to your knowledge, skills, and personal development in the following areas?” The possible 
answers are “Very much,” “Quite a bit,” “Some,” and “Very little.” 

Asking about educational outcomes in the way NSSE does, that is, without reference to either a 
beginning point or other standard and with vague response categories, is fundamentally flawed. 
Responses are subject to very significant “halo” effects (Pike, 1999; Wells, 1907) and are not valid 
indicators of actual learning or educational gains (Gonyea, 2005; Pascarella, 2001).

Researchers using variations of the NSSE approach have failed to find any valid relationship between
student self-reports of learning gains measured this way and actual gains. Pike (1993) used only three
response options, and, more recently, Bowman (2009), whose study used four response options, found
no correlation between self-reported gains and independently measured growth in the freshman year.
(Interestingly, the study by Anaya (1999) that did find that self-reports were a valid measure of learning
used an explicit five-point scale of change in skills: (1) much weaker, (2) weaker, (3) no change, (4)
stronger, and (5) much stronger.)

What is stunning, however, is that a full decade of NSSE activity has produced no research that
establishes the validity of the NSSE approach for measuring gains in learning or other educational
outcomes. Two early local studies (Belcheir, 2001, 2003) produced confusing and largely uninterpretable
results, and a more recent larger and methodologically sophisticated study (Carini, Kuh & Klein, 2006)
found no relationships between senior self-reports of educational gains and various independent
measures.

In addition to the lack of demonstrated validity, the NSSE educational outcomes results are problematic
on even the intuitive or descriptive level. The NSSE freshman-senior sample design should, one would
expect, showcase significant learning gains by comparing the freshman and senior results. However,
this is not the case at all. For freshmen on the 2008 NSSE the average percentage across the 16
educational outcomes items reporting gains of “Quite a bit” or “Very much” is 63%; for seniors the figure
is 65% (National Survey of Student Engagement, 2008b).

In other words, using the NSSE-style items, it appears superficially that there are significant educational
gains made in the freshman year and then almost no additional gains over the next three years! Not
exactly a case for positive learning outcomes or institutional “value-added.” Finally, there is no mention
of the 16-item educational outcomes section of NSSE or any of the items from it in the most recent
comprehensive 50-page report on the 2008 NSSE results (National Survey of Student Engagement, 2008b).

Given the prominence of the NSSE enterprise, its decade-long adherence to a fundamentally flawed 
approach constitutes a glaring gap (literally, in its annual reports) and missed opportunity for our 
understanding of the nature of learning and other educational outcomes in higher education. Because of 
its freshman-senior sample design (as opposed to the SERU/UCUES census approach), NSSE would 
probably not provide the scope of learning outcomes data needed by the University of California even 
with the most valid survey items. But lacking valid measures of learning outcomes entirely, NSSE, as thus
far designed, clearly “fails the test.”

C. Why SERU/UCUES May Succeed 

The major research universities that are part of the Voluntary System of Accountability have identified 
SERU/UCUES as one of four nationally recognized surveys for institutional accountability. The
overwhelming value of SERU/UCUES for the comprehensive informational needs of the University of 
California - and by extension other large-scale research universities - is its census (plus module) design 
that provides data down to the level of individual academic program and student subpopulations of 
interest. 

In terms of assessing learning outcomes specifically, however, SERU/UCUES also offers an innovative 
approach that sets it apart from conventional undergraduate surveys. This approach is drawn from the 
field of program evaluation where 30 years ago the work of Howard (1980; Howard et al., 1979; Howard & 
Dailey, 1979) and then others challenged the conventional wisdom that the most valid way to measure 
program effects or gains is to use a pretest-posttest design. Because of what Howard identified as 
“response-shift bias”, program participants are likely to have a more informed frame of reference as a 
consequence of their experience in the program, often making posttest evaluations of their proficiencies 
or knowledge both lower and more accurate than their pretest evaluations. 

With this insight, assessment of program or treatment effects did not need to rely as heavily on the more 
costly pretest-posttest evaluation design and could often substitute the retrospective posttest design. 
More recent research, however, has concluded that earlier views of the magnitude of response-shift bias 
were exaggerated (Wilson & Lipsey, 2001), and that the retrospective posttest design may produce 
ratings that are more biased than prospective ratings (Hill & Betz, 2005; Taylor, Russ-Eft, & Taylor, 2009). 
Not surprisingly, inflating the difference between retrospective and current self-ratings is associated with 
social desirability (Hill & Betz, 2005). Retrospective pretest ratings may be lowered because of 
motivational or systematic cognitive bias such as self-enhancement, implicit theory of change, and effort 
justification (Taylor, Russ-Eft, & Taylor, 2009). 






On the other hand, when comparing the retrospective pretest method with the perceived change method
(similar to the NSSE approach) and the post plus perceived change method in a study of teachers
reporting change in instructional practices, Lam & Bengo (2003) found that the retrospective pretest
method produced the least “satisficing” (Krosnick, 1991) and the fewest responses on the basis of social
desirability, while the perceived change method produced the most.

Several observations and generalizations emerge from the practice of the retrospective pretest method in
program evaluation and other fields. The method is especially useful if capturing accurately how change
is experienced subjectively by program participants is relevant. Where what is being rated is salient to
the participants’ sense of self, the “then” and “now” method may be more appropriate despite the obvious
heightened social desirability bias. Finally, if the costs for overestimating program effects are not great,
the advantages of using this approach can offset the potential biases.

All of these conditions would seem to apply to any large-scale effort to assess and report learning
outcomes in higher education. A method that allows us to capture accurately how different populations of
students (e.g., students in different majors) characterize their own learning gains at the University of
California and under what conditions should contribute considerably to our potential understanding of the
complexities of learning outcomes.

And given the political realities of accountability, an institution’s entirely transparent though favorably
biased presentation of learning gains as reported by students themselves surely has less potential
downside than the possibility of coming out on the short end of a perhaps unstable and certainly opaque
(for the public) “value-added” ratings scheme such as the CLA.



Granted, student self-reports of learning are only indirect indicators and are clearly favorably biased ones
at that. On the other hand, the large-scale census design allows us to amass tremendous amounts of
learning and other self-reported educational outcomes data at a fraction of the cost of any other method,
thereby providing an opportunity to conduct analyses to determine how self-reports are affected by
“inflationary bias” and the extent (if any) to which they are useful in validating and reporting learning
outcomes.

Therefore, in 2004 the original SERU/UCUES survey instrument included the retrospective pretest or
“then” and “now” self-report methodology to measure educational outcomes for University of California
undergraduates rather than adopting the rating of improvement approach used by NSSE and CSEQ.
Initially, for 14 educational outcomes, students were asked to assess their skills and proficiencies on a
seven-point scale (Very Poor, Poor, Fair, Good, Very Good, Excellent, Expert), both when they started at
the University of California and currently.

More recent versions of SERU/UCUES use a six-point scale (Very Poor, Poor, Fair, Good, Very Good,
Excellent), and the latest instrument has ratings of 21 different educational outcomes (and students
assigned to the Student Development module rate an additional six outcomes).
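
To make the “then” and “now” format concrete, the sketch below scores paired ratings on the six-point scale and summarizes them for a single outcome. It is a minimal illustration and not the SERU/UCUES analysis itself: the integer coding of the labels (1 to 6), the choice of “Very Good” as the cutoff for counting a student as proficient, the function name, and the four sample responses are all assumptions made for the example.

```python
# Six-point scale as listed in the text; the 1-6 integer coding is an assumption.
SCALE = ["Very Poor", "Poor", "Fair", "Good", "Very Good", "Excellent"]
SCORE = {label: i + 1 for i, label in enumerate(SCALE)}

def summarize(responses, proficient_at="Very Good"):
    """Summarize (then_label, now_label) pairs for one educational outcome."""
    then = [SCORE[t] for t, _ in responses]
    now = [SCORE[n] for _, n in responses]
    cut = SCORE[proficient_at]  # hypothetical proficiency cutoff
    n = len(responses)
    return {
        "mean_gain": sum(b - a for a, b in zip(then, now)) / n,
        "pct_proficient_then": 100 * sum(a >= cut for a in then) / n,
        "pct_proficient_now": 100 * sum(b >= cut for b in now) / n,
    }

# Four invented respondents rating one skill "when you started" and "currently".
sample = [
    ("Fair", "Good"),
    ("Good", "Very Good"),
    ("Good", "Good"),
    ("Poor", "Very Good"),
]
summary = summarize(sample)
```

Keeping both ratings for every respondent, rather than a single "how much did you improve" item, is what makes it possible to examine the starting-point ratings separately, for example to see whether they drift with year in school.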

Given the SERU/UCUES census design, therefore, we now have an incredibly rich set of student
retrospective pretest or “then” and “now” self-assessment data. We can examine self-reported
educational outcomes across a large number of domains for students at every point of their academic
careers, across and within different fields of study and for any number of student populations. Having
retrospective pretest items across so many different content areas gives us the ability to help assess and
control for the tendency to exhibit improvement biases.

An earlier study (Thomson, 2006) examined the SERU/UCUES educational gains data for University of
California, Berkeley freshmen, sophomores, juniors, and seniors and found that the student self-reports
demonstrated clear evidence of response-shift bias and/or self-enhancement bias: With each year in
school, ratings of “when you started” were lower.






But the results also showed a high degree of stability and “reasonableness” in that reported gains were
modest and selective (i.e., respondents did not report gains in all areas). Most importantly, the magnitude
of reported gains for juniors and seniors differed by domain and field of study, suggesting that the student
self-reports, though biased upward, did appear to reflect different patterns of learning.

D. The Current Study 

The research reported here examines the responses of seniors who entered as freshmen on six of the
educational outcomes self-reports on SERU/UCUES 2008: analytical and critical thinking skills, writing
skills, reading and comprehension skills, oral presentation skills, quantitative skills, and skills in a
particular field of study. Omitting respondents with missing data on UC GPA, gender, race/ethnicity,
immigrant generation, or major, the study has about 12,500 sets of responses.



Table 1 shows how University of California seniors assess their learning gains in each of the six areas. 
While seniors are more likely to rate themselves as proficient currently than when they began at the 
university in all six areas, what is noteworthy is how the magnitude of the gains varies across the six 
areas. 

There is less gain reported for quantitative skills in particular, which makes sense given that the majority 
of students major in non-quantitative-based fields. At the other extreme, self-reported gains are highest 
for knowledge of a specific field of study; that is, an area that cuts across all majors. These results, then, 
seem to have credible face-validity. 
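The gain figures discussed above are simple differences in percentage points between the “now” and “began” ratings. A minimal Python sketch of that arithmetic follows, using the Table 1 percentages; the function names are illustrative, and the 6% “began” value for field of study is taken as the figure implied by the reported 76% “now” rating and +70 gain.

```python
# Percent rating skills "Very Good" or "Excellent" (Table 1): (began, now).
# Field of study "began" is taken as 6, the value implied by the
# reported 76% "now" rating and the +70 point gain.
TABLE_1 = {
    "Quantitative skills": (28, 39),
    "Oral presentation": (18, 56),
    "Writing clearly": (24, 62),
    "Reading academic": (22, 71),
    "Critical thinking": (24, 76),
    "Field of study": (6, 76),
}


def gains(table):
    """Return {domain: now - began}, in percentage points."""
    return {domain: now - began for domain, (began, now) in table.items()}


def rank_by_gain(table):
    """Return the domains ordered from smallest to largest gain."""
    g = gains(table)
    return sorted(g, key=g.get)


if __name__ == "__main__":
    g = gains(TABLE_1)
    for domain in rank_by_gain(TABLE_1):
        print(f"{domain:22s} {g[domain]:+d}")
```

Ordering the domains this way puts quantitative skills at the low end and field of study at the high end, which is the pattern the text describes.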

If the SERU/UCUES senior self-reports of learning gains have validity, then we should observe a 
relationship with another assumed measure of learning, namely college GPA. The relationship between 
student self-reports and overall cumulative UC GPA is examined in Table 2. 

The way in which the relationships of UC GPA and student self-reports vary by skill area does suggest 
that student assessments using the SERU/UCUES approach have some degree of validity as indicators 
of learning outcomes. 

Specifically, the relationship is weakest for quantitative skills (gains are actually the lowest for the highest 
GPA students) and oral presentation skills, skill areas less uniformly related to academic achievement 
across all majors. 

Conversely, it is strongest for critical and analytical thinking and field of study. As shown in Table 3, 
senior self-reports are also related significantly to student demographics and field of study. 



Table 1. Percent Rating Skills as "Very Good" or "Excellent" Across Six Domains 

                          Began    Now     Gain 
QUANTITATIVE SKILLS        28%     39%     +11% 
ORAL PRESENTATION          18%     56%     +38% 
WRITING CLEARLY            24%     62%     +38% 
READING ACADEMIC           22%     71%     +49% 
CRITICAL THINKING          24%     76%     +52% 
FIELD OF STUDY              6%     76%     +70% 



Table 2. Percent Rating Skills as "Very Good" or "Excellent" Across Six Domains by Current 
Cumulative UC GPA Category 

                          Began    Now     Gain 
Quantitative 
  Under 2.8                23%     35%     +12% 
  2.8-3.19                 23%     36%     +13% 
  3.2-3.59                 26%     36%     +10% 
  3.6 & higher             34%     41%      +6% 
Oral Presentation 
  Under 2.8                18%     53%     +34% 
  2.8-3.19                 17%     51%     +35% 
  3.2-3.59                 17%     55%     +38% 
  3.6 & higher             19%     57%     +38% 
Writing 
  Under 2.8                19%     53%     +34% 
  2.8-3.19                 20%     55%     +36% 
  3.2-3.59                 23%     61%     +38% 
  3.6 & higher             29%     69%     +39% 
Reading 
  Under 2.8                19%     59%     +39% 
  2.8-3.19                 19%     62%     +43% 
  3.2-3.59                 21%     70%     +49% 
  3.6 & higher             25%     77%     +52% 
Critical Thinking 
  Under 2.8                19%     63%     +44% 
  2.8-3.19                 20%     67%     +48% 
  3.2-3.59                 23%     75%     +52% 
  3.6 & higher             29%     82%     +53% 
Field of Study 
  Under 2.8                 6%     62%     +56% 
  2.8-3.19                  6%     70%     +64% 
  3.2-3.59                  6%     76%     +70% 
  3.6 & higher              7%     82%     +75% 








Our first look at the results indicates that a multiplicity of factors may contribute to student self-ratings 
when using the retrospective pretest method. However, because these factors are to a substantial 
degree interrelated, we next examined the effects of the factors in combination. To do this, student 
responses were analyzed using a 2 × 2 × 2 × 2 × 2 design: 

• UC GPA < 3.2 versus UC GPA >= 3.2 

• Science (STEM disciplines) versus non-science 

• Immigrant (both parents not born in US) versus non-immigrant 

• Male versus female 

• Asian versus non-Asian 

This design yields 32 separate combinations or 16 “controlled” comparisons for each of the five factors. 

For example, in examining the relationship of UC GPA to self-ratings we compared the ratings of male 
immigrant Asian science respondents in the two GPA categories, the ratings of female immigrant Asian 
science respondents in the two GPA categories, and so forth. 

Unweighted averages for the 16 comparisons for each of the five factors across the six skill domains are 
shown in Table 4. 
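The bookkeeping behind this design can be sketched in a few lines of Python. The sketch below enumerates the 32 cells, pairs up the 16 cells that differ only on one factor, and averages the paired differences without weighting by cell size. The factor labels, function names, and the `rating_by_cell` mapping are illustrative assumptions, not the authors' actual analysis code; any ratings passed in would have to come from the survey data.

```python
from itertools import product

# The five binary factors of the design; labels are illustrative.
FACTORS = {
    "gpa": ("<3.2", ">=3.2"),
    "field": ("science", "non-science"),
    "immigrant": ("immigrant", "non-immigrant"),
    "gender": ("male", "female"),
    "ethnicity": ("Asian", "non-Asian"),
}


def all_cells():
    """All 2**5 = 32 combinations of the five factors."""
    names = list(FACTORS)
    return [dict(zip(names, levels)) for levels in product(*FACTORS.values())]


def cell_key(cell):
    """Hashable key for a cell, in the fixed factor order of FACTORS."""
    return tuple(cell[name] for name in FACTORS)


def controlled_pairs(factor):
    """The 16 pairs of cells that differ only on `factor`."""
    others = [name for name in FACTORS if name != factor]
    first, second = FACTORS[factor]
    pairs = []
    for levels in product(*(FACTORS[name] for name in others)):
        fixed = dict(zip(others, levels))
        pairs.append(({**fixed, factor: first}, {**fixed, factor: second}))
    return pairs


def unweighted_mean_difference(factor, rating_by_cell):
    """Mean of (second level - first level) over the 16 controlled pairs.

    `rating_by_cell` maps cell_key(cell) to a percentage (hypothetical
    input). Every pair counts equally, regardless of cell size.
    """
    diffs = [
        rating_by_cell[cell_key(b)] - rating_by_cell[cell_key(a)]
        for a, b in controlled_pairs(factor)
    ]
    return sum(diffs) / len(diffs)
```

Averaging the pairs without weights keeps a large cell from dominating the comparison, at the cost of giving small and possibly noisy cells the same influence as large ones.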

The 2 × 2 × 2 × 2 × 2 analysis suggests that UC GPA, field of study, and ethnicity are all associated with 
substantial differences in student self-ratings of educational outcomes even after controlling for other 
factors. 

After controlling for other factors, gains by UC GPA are greater for higher-GPA students in field of study 
and reading academic material; for field of study, the greater gains by science students in quantitative 
skills are offset by equally greater gains by non-science students in writing clearly; and for ethnicity, Asian 
student percentage gains are lower by double digits for all areas except quantitative skills. Immigrant 
generation has modest effects and, with the exception of quantitative skills, there are no differences by 
gender. 

To appreciate the magnitude of the combined effect of UC GPA, field of study, and ethnicity on 
self-ratings, three-way crosstabs were run for current skill ratings for each of the six domains. As can be 
seen in Table 5, the joint effects of UC GPA, field of study, and ethnicity on the proficiency ratings of 
University of California seniors can be quite dramatic. For example, for “writing clearly and effectively” the 
range is from 43% to 80% rating themselves as “Very Good” or “Excellent”. Also worth noting is that the 
relative magnitude of each factor varies across skill domains in ways that “make sense.” 

Our approach here, of course, underestimates the full impact of various factors on senior self-ratings. For 
example, certain fields of study (e.g., engineering, within science) and more differentiation with UC GPA 
would yield more extreme differences. The fact that Asian students rate themselves lower even after 
controlling, at least broadly, for other factors is, of course, very intriguing and may be our first hint of 
important cultural differences in how bias affects self-ratings of learning gains. As shown in Table 5, 



Table 3. Senior Self-Reports by Ethnicity, Demographics and Field of Study 

ETHNICITY                              Quant   Oral   Writing   Reading   Thinking 
Began   Asian                           29%    17%     20%       19%       20% 
        Black                           21%    22%     25%       30%       24% 
        Latino                          20%    20%     21%       24%       21% 
        White                           28%    25%     35%       32%       35% 
Now     Asian                           39%    46%     47%       56%       60% 
        Black                           33%    65%     70%       78%       82% 
        Latino                          33%    61%     64%       75%       78% 
        White                           39%    56%     71%       79%       84% 
Gains   Asian                           +10    +29     +27       +37       +40 
        Black                           +12    +43     +45       +48       +58 
        Latino                          +13    +41     +43       +51       +57 
        White                           +11    +31     +36       +47       +49 

IMMIGRATION                            Quant   Oral   Writing   Reading   Thinking 
Began   Student Not Born in US          28%    18%     17%       20%       20% 
        Parent(s) Not Born in US        25%    22%     22%       24%       22% 
        Both Parents Born in US         26%    27%     34%       34%       36% 
Now     Student Not Born in US          41%    42%     43%       54%       56% 
        Parent(s) Not Born in US        36%    44%     51%       58%       59% 
        Both Parents Born in US         36%    50%     66%       72%       78% 
Gains   Student Not Born in US          +13    +24     +26       +34       +36 
        Parent(s) Not Born in US        +11    +22     +29       +34       +37 
        Both Parents Born in US         +10    +23     +32       +38       +42 

FIELD OF STUDY                         Quant   Oral   Writing   Reading   Thinking 
Began   Engineering, Math, Science      39%    17%     26%       23%       31% 
        Biological Sciences             34%    19%     25%       23%       25% 
        Social Sciences                 22%    22%     25%       26%       26% 
        Humanities                      18%    26%     33%       33%       31% 
Now     Engineering, Math, Science      74%    52%     44%       59%       63% 
        Biological Sciences             49%    48%     50%       64%       59% 
        Social Sciences                 28%    53%     65%       70%       68% 
        Humanities                      14%    54%     75%       77%       73% 
Gains   Engineering, Math, Science      +35    +35     +18       +36       +32 
        Biological Sciences             +15    +29     +25       +31       +34 
        Social Sciences                  +6    +31     +40       +44       +42 
        Humanities                       -3    +28     +42       +44       +42 








these SERU/UCUES results give us an initial appreciation of the regularities and patterns in retrospective 
pretest data and perhaps some of the complexity its use will entail. 

E. Campus Differences and Accountability 

The results presented here represent the 12,500 seniors who entered as freshmen across the University 
of California, that is, without reference to individual campuses. There is perhaps a natural curiosity to 
compare the senior self-reports of educational outcomes across campuses, and for public accountability, 
perhaps even an imperative to do so. 



Table 4. Percent Rating Skills as "Very Good" or "Excellent" by One Factor When Controlling for 
Other Four Factors (Unweighted Average of Sixteen Comparisons) 

UC GPA                           Quant   Oral   Writing   Reading   Thinking   Field 
Began   GPA < 3.2                 25%    17%     21%       21%       20%        6% 
        GPA >= 3.2                33%    17%     25%       22%       27%        6% 
Now     GPA < 3.2                 40%    53%     57%       64%       69%       70% 
        GPA >= 3.2                45%    56%     62%       73%       78%       80% 
Gain    GPA < 3.2                +15%   +36%    +36%      +43%      +49%      +64% 
        GPA >= 3.2               +12%   +39%    +37%      +51%      +52%      +73% 
Difference in Gain                -3%     3%      2%        8%        3%       10% 

FIELD OF STUDY                   Quant   Oral   Writing   Reading   Thinking   Field 
Began   Science                   34%    16%     25%       21%       26%        6% 
        Not Science               24%    18%     22%       21%       21%        6% 
Now     Science                   58%    54%     50%       65%       71%       75% 
        Not Science               27%    55%     69%       72%       77%       75% 
Gain    Science                  +24%   +38%    +26%      +44%      +45%      +68% 
        Not Science               +3%   +37%    +47%      +50%      +56%      +69% 
Difference in Gain               -21%    -1%     22%        6%       10%        0% 

ETHNICITY                        Quant   Oral   Writing   Reading   Thinking   Field 
Began   Asian                     29%    15%     22%       19%       22%        6% 
        Not Asian                 28%    19%     25%       23%       25%        6% 
Now     Asian                     40%    48%     52%       59%       65%       69% 
        Not Asian                 45%    62%     67%       77%       82%       81% 
Gain    Asian                    +11%   +33%    +31%      +41%      +44%      +63% 
        Not Asian                +16%   +43%    +42%      +54%      +57%      +74% 
Difference in Gain                 5%    10%     11%       13%       13%       12% 

IMMIGRANT                        Quant   Oral   Writing   Reading   Thinking   Field 
Began   Immigrant                 28%    17%     19%       18%       19%        6% 
        Not Immigrant             30%    17%     27%       24%       28%        6% 
Now     Immigrant                 42%    57%     57%       67%       72%       73% 
        Not Immigrant             43%    53%     62%       70%       76%       77% 
Gain    Immigrant                +14%   +40%    +38%      +49%      +52%      +66% 
        Not Immigrant            +13%   +35%    +35%      +45%      +49%      +71% 
Difference in Gain                -1%    -5%     -3%       -3%       -3%        5% 

GENDER                           Quant   Oral   Writing   Reading   Thinking   Field 
Began   Male                      31%    15%     22%       20%       26%        7% 
        Female                    26%    19%     24%       22%       21%        5% 
Now     Male                      49%    54%     59%       68%       77%       76% 
        Female                    36%    56%     60%       69%       71%       74% 
Gain    Male                     +17%   +38%    +37%      +48%      +51%      +69% 
        Female                   +10%   +37%    +36%      +47%      +50%      +69% 
Difference in Gain                -8%    -1%      0%       -1%       -1%        0% 



Table 5. Percent Seniors Rating Current Skills as “Very Good” or “Excellent” by Ethnicity, Field 
of Study, and UC GPA 

                                          Asian                    Not Asian 
                                   Science   Not Science     Science   Not Science 
SPECIFIC FIELD OF STUDY 
  GPA < 3.2                          62%        63%            76%        78% 
  GPA >= 3.2                         76%        75%            85%        83% 
ANALYTICAL AND CRITICAL THINKING 
  GPA < 3.2                          57%        62%            76%        82% 
  GPA >= 3.2                         68%        75%            84%        87% 
READING 
  GPA < 3.2                          51%        57%            70%        76% 
  GPA >= 3.2                         61%        69%            77%        84% 
WRITING 
  GPA < 3.2                          43%        55%            56%        73% 
  GPA >= 3.2                         43%        68%            58%        80% 
ORAL PRESENTATION 
  GPA < 3.2                          45%        46%            60%        62% 
  GPA >= 3.2                         50%        50%            63%        63% 
QUANTITATIVE SKILLS 
  GPA < 3.2                          47%        25%            60%        27% 
  GPA >= 3.2                         60%        29%            65%        26% 



Table 6 illustrates what University of California campus differences look like and why the display of such 
differences without further analysis is misleading. As can be seen in the top panel, two University of 
California campuses are clear outliers or “winners” with higher percentages of their seniors rating 
themselves as skillful or proficient than at other campuses. 

However, simply adjusting for two broad differences in campus composition, Asian vs. non-Asian and 
science vs. non-science, eliminates entirely the apparent advantage of one of the campuses and 
substantially reduces it for the other. (Additional controls, e.g., for socioeconomic composition, would 
likely eliminate entirely the advantage in the second case.) Being able to demonstrate this provides a very 
practical application of our initial research findings on the social context of student self-ratings at the 
University of California. 
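The text does not spell out how the adjustment for campus composition was computed. One common approach, offered here only as an assumption, is direct standardization: each campus's subgroup percentages are reweighted to a shared reference mix of Asian/non-Asian and science/non-science students. The Python sketch below uses entirely hypothetical numbers and names to show how a campus whose subgroups rate themselves exactly like the reference population can still look like a “winner” before adjustment.

```python
def weighted_rate(cell_rates, shares):
    """Percentage obtained by weighting subgroup rates by population shares.

    Pass a campus's own shares for its unadjusted rate, or a common
    reference mix for its directly standardized (adjusted) rate.
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(cell_rates[cell] * shares[cell] for cell in cell_rates)


# Hypothetical percent rating themselves "Very Good" or "Excellent"
# within each ethnicity-by-field cell on one campus.
CELL_RATES = {
    ("Asian", "science"): 45.0,
    ("Asian", "non-science"): 60.0,
    ("non-Asian", "science"): 57.0,
    ("non-Asian", "non-science"): 76.0,
}

# Hypothetical composition of that campus, tilted toward the
# highest-rating cell.
CAMPUS_SHARES = {
    ("Asian", "science"): 0.1,
    ("Asian", "non-science"): 0.1,
    ("non-Asian", "science"): 0.2,
    ("non-Asian", "non-science"): 0.6,
}

# Hypothetical reference mix shared by all campuses.
REFERENCE_SHARES = {cell: 0.25 for cell in CELL_RATES}

if __name__ == "__main__":
    print("unadjusted:", round(weighted_rate(CELL_RATES, CAMPUS_SHARES), 1))
    print("adjusted:  ", round(weighted_rate(CELL_RATES, REFERENCE_SHARES), 1))
```

In this illustration the campus's unadjusted figure is 67.5% and its adjusted figure is 59.5%. The eight-point difference comes entirely from who is enrolled, not from how any subgroup rates itself.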








F. Conclusion 

Compared to the Collegiate Learning Assessment (CLA) and the National Survey of Student Engagement 
(NSSE), SERU/UCUES appears the better approach in addressing the need for greater accountability for 
assessing and reporting learning outcomes in higher education. But the example of the apparent 
differences in learning outcomes across the undergraduate campuses of the University of California 
illustrates the obvious pitfalls and limitations of the self-report data. 

Tempting though it is, we cannot accept self-reports of learning and educational outcomes at face value. 
The UCUES/SERU data have all the problems of upward bias (social desirability, “halo” effect, etc.) 
inherent in self-reported data in institutional research (Gonyea, 2005). 

The problem is compounded by the fact that we already have evidence that the extent of bias is not 
uniform, i.e., the observed differences between Asian and non-Asian respondents. 

On the other hand, these data, and the fact that they can be related to the extensive academic 
engagement data also collected on the SERU/UCUES survey as well as to the range of demographic 
and institutional data also available, offer an unprecedented opportunity to advance our understanding of 
the nature of self-reported learning outcomes in higher education and the extent to which these reports 
can contribute as indirect but valid measures of positive educational outcomes at the research university. 

Our efforts here should be informed by the following: 

(1) While the UCUES/SERU data are collected for entire campuses, the unique value of the census 
design is our ability to “drill down” to individual academic departments, student subpopulations, and other 
fine-grained “units of analysis.” In examining patterns of learning outcomes, it will be particularly useful to 
do so at the level of student major (Chatman, 2007) and to provide departments the ability to “triangulate” 
disciplinary-specific direct measures of learning with the cost-effective externally generated 
SERU/UCUES survey data. 

(2) Used properly, the extensive SERU/UCUES student self-reported indirect measures of learning 
outcomes should encourage greater attention to direct measures of student learning, not serve as a 
substitute for such measures. SERU/UCUES demonstrates that extensive individual student data can be 
collected electronically and relatively inexpensively no matter how large the university. Large-scale use 
of electronic portfolios may be more feasible than generally thought (Banta, 2009). 

(3) Conversely, “lowest common denominator” calculations of learning gains, such as deriving global 
outcome measures for an entire campus, especially without adjustment for student characteristics and 
compositional effects, will be less helpful, especially for encouraging campus self-improvement. In the 
Voluntary System of Accountability (VSA) and elsewhere, it is precisely these kinds of global measures 
that are used, even though we know that such measures can be very misleading. 






Table 6. Percent Rating Current Skills as "Very Good" or "Excellent" by Individual University of 
California Campus Before and After Adjusting for Differences in Asian versus Non-Asian and 
Science versus Non-Science Composition. 

CRITICAL THINKING 
Campus     Unadjusted    Adjusted    Change 
A             72%          71%         -1% 
B             73%          73%          0% 
C             73%          72%         -1% 
D             73%          71%         -2% 
E             75%          74%         -1% 
F             77%          77%          0% 
G             83%          74%         -9% 
H             84%          78%         -6% 

WRITING CLEARLY 
Campus     Unadjusted    Adjusted    Change 
A             60%          59%         -1% 
B             56%          56%          0% 
C             67%          65%         -2% 
D             59%          58%         -1% 
E             60%          59%         -1% 
F             61%          59%         -2% 
G             72%          62%        -10% 
H             72%          61%        -11% 

FIELD OF STUDY 
Campus     Unadjusted    Adjusted    Change 
A             75%          75%          0% 
B             74%          74%          0% 
C             74%          74%          0% 
D             76%          75%         -1% 
E             74%          75%          1% 
F             75%          75%          0% 
G             83%          75%         -8% 
H             84%          81%         -3% 








Conventionally, of course, with the use of sample surveys such as NSSE, only institution-wide statistics 
are available. SERU/UCUES offers the possibility of a different metric or unit of analysis, one that is 
predicated on institutional self-improvement. For example, of the 25 largest departments at a research 
university, how many have student ratings that meet a certain criterion? How many have demonstrated 
improvement in learning gains, as reported by their majors? The focus, in other words, would be at a 
level that is interpretable and more amenable to change. 

Our conclusion: The time has come for institutional researchers and analysts at the University of 
California to take full advantage of the tremendous amount of retrospective pretest data on educational 
outcomes that we have available from SERU/UCUES. We should extend our inquiry to the full array of 21 
educational outcome items in the core, examine the data across the full range of undergraduate cohorts 
and subpopulations of interest, and identify and encourage any number of more focused “validity” studies. 

We are optimistic that such efforts will significantly advance our understanding of educational outcomes 
and help facilitate the improvement of teaching and learning at the research university. To this we should 
be held accountable. 



REFERENCES 

Adelman, C. (2006). Border blind side. Education Week, 26 (11), November 8. 

Anaya, G. (1999). College impact on student learning: Comparing the use of self-reported gains, 
standardized test scores, and college grades. Research in Higher Education, 40, 499-526. 

Arum, R. & Roska, J. (2008). Learning to reason and communicate in college: Initial report of findings 
from the longitudinal CLA study. Social Science Research Council, New York, NY. 

Banta, T. (2006). Reliving the history of large-scale assessment in higher education. Assessment Update, 
18 (4), 3-4, 15. 

Banta, T. (2007). A warning on measuring learning outcomes. Inside Higher Education, January 26. 
Found at: http://www.insidehighered.com/views/2007/01/26/banta 

Banta, T. (2009). Assessment for improvement and accountability. Provost's Forum on the Campus 
Learning Environment, University of Michigan, February 4, 2009. 

Belcheir, M. J. (2001). What predicts perceived gains in learning and in satisfaction? (Report No. 
BSU-RR-2001-02). Boise, ID: Office of Institutional Advancement (ERIC Document Reproduction 
Service No. ED480921). 

Belcheir, M. J. (2003). Student academic and personal growth while at Boise State: A summary of 2002 
National Survey of Student Engagement findings. (Report No. BSU-RR-2003-03). Boise, ID: 
Office of Institutional Advancement (ERIC Document Reproduction Service No. ED480934). 

Bowman, N. A. (2009). Can first-year college students provide accurate self-reports about their learning 
and development? Unpublished manuscript. University of Notre Dame. 

Braun, H. (2008). Vicissitudes of the Validators. 2008 Reidy Interactive Lectures Series, Portsmouth, 
NH. 

Brint, S., Cantwell, A. M., & Hanneman, R. (2008). The two cultures of undergraduate academic 
engagement. Research in Higher Education, 49 (5), 383-402. 

Carini, R. M., Kuh, G. D., & Klein, S. P. (2006). Student engagement and student learning: Testing the 
linkages. Research in Higher Education, 47 (1), 1-32. 








Chatman, S. (2007). Institutional Versus Academic Discipline Measures of Student Experience: A Matter 
of Relative Validity. Center for Studies in Higher Education, University of California, Berkeley. 

Consortium on Financing Higher Education (COFHE) (2008). Assessment: A Fundamental Responsibility. 
Found at: http://www.assessmentstatement.org/index_files/Pago71 7.htm 

Douglass, J.A., Roebken, H. & Thomson, G. (2007). The immigrant university: Assessing the dynamics 
of race, major and socioeconomic characteristics at the University of California. Center for 
Studies in Higher Education, University of California, Berkeley 

Douglass, J.A. and Thomson, G. (2008). The poor and the rich: A look at economic stratification and 
academic performance among undergraduate students in the United States. Center for Studies 
in Higher Education, University of California, Berkeley. 

Gonyea, R. M. (2005). Self-reported data in institutional research: Review and recommendations. In P. 
D. Umbach (Ed.), New Directions for Institutional Research, 127 (Fall), 73-89. San Francisco: 
Jossey-Bass. 

Hill, L. G., & Betz, D. I. (2005). Revisiting the retrospective pretest. American Journal of Evaluation, 26 
(4), 501-517. 

Howard, G. S. (1980). Response-shift bias: A problem in evaluating interventions with pre/post self- 
reports. Evaluation Review, 4, 93-106. 

Howard, G. S. & Dailey, P. R. (1979). Response-shift bias: A source of contamination of self-report 
measures. Journal of Applied Psychology, 64, 144-150. 

Howard, G. S., Ralph, K. M., Gulanick, N. A., Maxwell, S. E., Nance, D. W., & Gerber, S. K. (1979). 
Internal invalidity in pretest-posttest self-report evaluations and a re-evaluation of retrospective 
pretests. Applied Psychological Measurement, 3, 1-23. 

Klein, S., Benjamin, R., and Shavelson, R. (2007). The Collegiate Learning Assessment: Facts and 
fantasies. Evaluation Review, 31 (5), 415-439. 

Klein, S., Freedman, D., Shavelson, R., & Bolus, R. (2008). Assessing school effectiveness. Evaluation 
Review, 32 (6), 511-525. 

Klein, S., Kuh, G., Chun, M., Hamilton, L. & Shavelson, R. (2005). An approach to measuring cognitive 
outcomes across higher education institutions. Research in Higher Education, 46 (3), 251-276. 

Krosnick, J. A. (1991). Response strategies for coping with the cognitive demands of attitude measures 
in surveys. Applied Cognitive Psychology, 5, 213-236. 

Lam, T. C. M. & Benge, P. (2003). A comparison of three retrospective self-reporting methods of 
measuring change in instructional practice. American Journal of Evaluation, 24 (1), 65-80. 

National Survey of Student Engagement (2008a). Frequency Distributions: 2008. Indiana University 
Center for Postsecondary Research. Bloomington, IN 

National Survey of Student Engagement (2008b). Promoting engagement for all students: The imperative 
to look within. 2008 Results. Indiana University Center for Postsecondary Research. 
Bloomington, IN 

Pace, C. R. (1979). Measuring the Outcomes of College. San Francisco: Jossey-Bass 

Pace, C. R. (1984). Measuring the quality of college student experience: An account of the development 
and use of the College Student Experiences Questionnaire. Los Angeles: Higher Education 
Research Institute. 







Pascarella, E. T. (2001). Using student self-reported gains to estimate college impact: A cautionary tale. 
Journal of College Student Development, 42, 488-492. 

Pike, G. R. (1993). The relationship between perceived learning and satisfaction in college: An 
alternative view. Research in Higher Education, 34, 23-40. 

Pike, G. R. (1999). The constant error of the halo in educational outcomes research. Research in Higher 
Education, 40, 61-86. 

Pike, G. R. (2006) Value-added measures and the Collegiate Learning Assessment. Assessment Update, 
18 (4), 5-7. 

Shulman, L. S. (2007) Counting and recounting: Assessment and the quest for accountability. Change. 
January-February. 

Spellings Commission on the Future of Higher Education (2006). A Test of Leadership: Charting the Future 
of U.S. Higher Education, US Department of Education, September 26, 2006. 

Taylor, P.T., Russ-Eft, D. F., & Taylor, H. (2009). Gilding the outcome by tarnishing the past. American 
Journal of Evaluation, 30 (1), 31-43. 

Thomson, G. (2006). New developments in the assessment of student development and proficiencies. 
Paper presented at the annual meeting of ACPA, Indianapolis, IN. 

Wells, F. (1907). A statistical study of literary merit. Archives of Psychology, 7, 5-30. 

Wilson, D. B., & Lipsey, M. W. (2001). The role of method in treatment effectiveness research: Evidence 
from meta-analysis. Psychological Methods, 6, 413-429. 



NOTES 



^ The Spellings Commission was announced on September 19, 2005, by U.S. Secretary of Education Margaret 
Spellings. The nineteen-member Commission was charged with recommending a national strategy for reforming 
post-secondary education, with a particular focus on how well colleges and universities are preparing students for the 
21st-century workplace, as well as a secondary focus on how well high schools are preparing the students for post- 
secondary education. In the report, released on September 26, 2006, the Commission focuses on four key areas: 
access, affordability (particularly for non-traditional students), the standards of quality in instruction, and the 
accountability of institutions of higher learning to their constituencies (students, families, taxpayers, and other 
investors in higher education). 

^ UC President Robert C. Dynes quoted in Scott Jaschik, “Accountability System Launched,” Inside Higher Education, 
Nov. 12, 2007 

^ Speech before the National Press Club, reported in The Chronicle of Higher Education, Feb. 1, 2008. 

These include Florida, Michigan, Minnesota, Pittsburgh, Rutgers, and Oregon. 

^ For the student experiences and perceptions category of the VSA, participating institutions are required to report 
data from one of four surveys: the College Student Experiences Questionnaire, the College Senior Survey, the 
National Survey of Student Engagement, or the SERU Survey (or what is known in the UC system as the University 
of California Undergraduate Experience Survey). 



